TEI exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The Bible as a parallel corpus : Annotating the "book of 2000 tongues"

Internal identifier: 000374 (Main/Exploration); previous: 000373; next: 000375

Authors: P. Resnik [United States]; M. B. Olsen [United States]; M. Diab [United States]

Source: Computers and the humanities (ISSN 0010-4817), 1999

RBID : Francis:524-99-12218

French descriptors

English descriptors

Abstract

We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present-day language.


Affiliations:



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">The Bible as a parallel corpus : Annotating the "book of 2000 tongues"</title>
<author>
<name sortKey="Resnik, P" sort="Resnik, P" uniqKey="Resnik P" first="P." last="Resnik">P. Resnik</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Maryland</region>
<settlement type="city">College Park (Maryland)</settlement>
</placeName>
<orgName type="university">Université du Maryland</orgName>
</affiliation>
</author>
<author>
<name sortKey="Olsen, M B" sort="Olsen, M B" uniqKey="Olsen M" first="M. B." last="Olsen">M. B. Olsen</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Maryland</region>
<settlement type="city">College Park (Maryland)</settlement>
</placeName>
<orgName type="university">Université du Maryland</orgName>
</affiliation>
</author>
<author>
<name sortKey="Diab, M" sort="Diab, M" uniqKey="Diab M" first="M." last="Diab">M. Diab</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Maryland</region>
<settlement type="city">College Park (Maryland)</settlement>
</placeName>
<orgName type="university">Université du Maryland</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">524-99-12218</idno>
<date when="1999">1999</date>
<idno type="stanalyst">FRANCIS 524-99-12218 INIST</idno>
<idno type="RBID">Francis:524-99-12218</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000074</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000055</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000056</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">000056</idno>
<idno type="wicri:doubleKey">0010-4817:1999:Resnik P:the:bible:as</idno>
<idno type="wicri:Area/Main/Merge">000401</idno>
<idno type="wicri:Area/Main/Curation">000374</idno>
<idno type="wicri:Area/Main/Exploration">000374</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">The Bible as a parallel corpus : Annotating the "book of 2000 tongues"</title>
<author>
<name sortKey="Resnik, P" sort="Resnik, P" uniqKey="Resnik P" first="P." last="Resnik">P. Resnik</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Maryland</region>
<settlement type="city">College Park (Maryland)</settlement>
</placeName>
<orgName type="university">Université du Maryland</orgName>
</affiliation>
</author>
<author>
<name sortKey="Olsen, M B" sort="Olsen, M B" uniqKey="Olsen M" first="M. B." last="Olsen">M. B. Olsen</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Maryland</region>
<settlement type="city">College Park (Maryland)</settlement>
</placeName>
<orgName type="university">Université du Maryland</orgName>
</affiliation>
</author>
<author>
<name sortKey="Diab, M" sort="Diab, M" uniqKey="Diab M" first="M." last="Diab">M. Diab</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s1>Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland</s1>
<s2>College Park, MD 20742</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Maryland</region>
<settlement type="city">College Park (Maryland)</settlement>
</placeName>
<orgName type="university">Université du Maryland</orgName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Computers and the humanities</title>
<title level="j" type="abbreviated">Comput. humanit.</title>
<idno type="ISSN">0010-4817</idno>
<imprint>
<date when="1999">1999</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Computers and the humanities</title>
<title level="j" type="abbreviated">Comput. humanit.</title>
<idno type="ISSN">0010-4817</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Alignment</term>
<term>Computational linguistics</term>
<term>Corpus annotation</term>
<term>Electronic text</term>
<term>Parallel corpus</term>
<term>TEI</term>
<term>Translation</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Linguistique informatique</term>
<term>Annotation de corpus</term>
<term>Texte électronique</term>
<term>Traduction</term>
<term>Alignement</term>
<term>Recherche linguistique</term>
<term>Encodage</term>
<term>Bible</term>
<term>TEI</term>
<term>Corpus parallèle</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Traduction</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present day language</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Maryland</li>
</region>
<settlement>
<li>College Park (Maryland)</li>
</settlement>
<orgName>
<li>Université du Maryland</li>
</orgName>
</list>
<tree>
<country name="États-Unis">
<region name="Maryland">
<name sortKey="Resnik, P" sort="Resnik, P" uniqKey="Resnik P" first="P." last="Resnik">P. Resnik</name>
</region>
<name sortKey="Diab, M" sort="Diab, M" uniqKey="Diab M" first="M." last="Diab">M. Diab</name>
<name sortKey="Olsen, M B" sort="Olsen, M B" uniqKey="Olsen M" first="M. B." last="Olsen">M. B. Olsen</name>
</country>
</tree>
</affiliations>
</record>
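
The record above uses the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware XML parser is likely to reject it as written. The following is a minimal Python sketch, not part of Dilib, that binds those prefixes to placeholder URIs before parsing and then reads the title, authors and English keywords. The URIs, the function names `parse_record` and `summarize`, and the trimmed sample are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record shown above (one author, two keywords).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">The Bible as a parallel corpus : Annotating the "book of 2000 tongues"</title>
<author>
<name sortKey="Resnik, P" sort="Resnik, P" uniqKey="Resnik P" first="P." last="Resnik">P. Resnik</name>
<affiliation wicri:level="4">
<inist:fA14 i1="01">
<s3>USA</s3>
</inist:fA14>
<country>États-Unis</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Alignment</term>
<term>Parallel corpus</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

# Placeholder namespace URIs: assumptions made only so the parser accepts
# the wicri: and inist: prefixes; they are not official identifiers.
PLACEHOLDER_NS = {
    "wicri": "urn:example:wicri",
    "inist": "urn:example:inist",
}


def parse_record(text):
    """Parse a record after binding the undeclared prefixes on the root tag."""
    decls = "".join(f' xmlns:{p}="{u}"' for p, u in PLACEHOLDER_NS.items())
    patched = text.replace("<record>", f"<record{decls}>", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, author names and English keywords of a record."""
    title_stmt = root.find("./TEI/teiHeader/fileDesc/titleStmt")
    title = title_stmt.findtext("title")
    authors = [n.text for n in title_stmt.findall("author/name")]
    keywords = [t.text for t in root.findall(".//keywords[@scheme='KwdEn']/term")]
    return {"title": title, "authors": authors, "keywords": keywords}


if __name__ == "__main__":
    print(summarize(parse_record(RECORD)))
```

Patching the root tag as a string keeps the record body untouched; a tool that already knows the real namespace URIs should declare those instead of the placeholders.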

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000374 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000374 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Francis:524-99-12218
   |texte=   The Bible as a parallel corpus : Annotating the "book of 2000 tongues"
}}
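
When links to many records are needed, the template call above can be generated rather than typed. The sketch below is a hypothetical Python helper (the name `explor_link` and its argument names are assumptions, not part of Wicri or Dilib) that reproduces the parameter names and column alignment of the example.

```python
def explor_link(wiki, area, flux, etape, type_, cle, texte):
    """Render an {{Explor lien}} call with the parameter names shown above."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "name=" to a fixed width so the values line up as in the page.
        lines.append(f"   |{(name + '=').ljust(9)}{value}")
    lines.append("}}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(explor_link(
        "Wicri/Ticri", "TeiVM2", "Main", "Exploration", "RBID",
        "Francis:524-99-12218",
        'The Bible as a parallel corpus : Annotating the "book of 2000 tongues"',
    ))
```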

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024